SarsaLandmark: an algorithm for learning in POMDPs with landmarks
نویسندگان
چکیده
Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples, problems in which Sarsa(λ < 1 ) fails to find a good policy even though one exists. Despite this, these algorithms remain of great interest because alternative approaches to learning in POMDPs based on approximating belief-states do not scale. In this paper we present SarsaLandmark, an algorithm for learning in POMDPs with ”landmark” states (most man-made and many natural environments have landmarks). SarsaLandmark simultaneously preserves the advantages offered by eligibility traces and fixes the cause of the failure of Sarsa(λ) on the motivating counterexamples. We present a theoretical analysis of SarsaLandmark for the policy evaluation problem and present empirical results on a few learning control problems.
منابع مشابه
Stochastic Shortest Path with Energy Constraints in POMDPs
We extend the traditional framework of POMDPs to model resource consumption inducing a hard constraint on the behaviour of the model. Resource levels increase and decrease with transitions, and the hard constraint requires that the level remains positive in all steps. We present an algorithm for solving POMDPs with resource levels, developing on existing POMDP solvers. Our second contribution i...
متن کاملLearning Sorting and Decision Trees with POMDPs
pomdps are general models of sequential decisions in which both actions and observations can be probabilistic. Many problems of interest can be formulated as pomdps, yet the use of pomdps has been limited by the lack of eeective algorithms. Recently this has started to change and a number of problems such as robot navigation and planning are beginning to be formulated and solved as pomdps. The ...
متن کاملA Cultural Algorithm for POMDPs from Stochastic Inventory Control
Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs) which occur in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algor...
متن کاملLearning the Reward Model of Dialogue POMDPs from Data
Spoken language communication between human and machines has become a challenge in research and technology. In particular, enabling the health care robots with spoken language interface is of great attention. Recently due to uncertainty characterizing dialogues, there has been interest for modelling the dialogue manager of spoken dialogue systems using Partially Observable Markov Decision Proce...
متن کاملRepresenting hierarchical POMDPs as DBNs for multi-scale map learning
We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). We use this model for representing and learning multi-resolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or t...
متن کامل